---
title: "Normal probabilities and quantiles"
format:
  revealjs:
    slide-number: true
    embed-resources: true
editor: source
---
This program displays the standard normal curve along with various probabilities and quantiles. It was written by Steve Simon on 2024-09-01 and is placed in the public domain.
## Using R to draw the standard normal curve
Use seq to calculate 100 evenly spaced values between -4 and +4 and dnorm to compute the height of the bell curve at each point. Use geom_polygon to fill the area under the bell curve.
```{r standard-normal}
library(ggplot2)  # provides ggplot, aes, and the geom_ functions
x <- seq(-4, 4, length.out=100)
y <- dnorm(x)
data.frame(x, y) |>
  ggplot(aes(x, y)) +
    geom_polygon(fill="white", color="black") -> normal_curve
normal_curve
```
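As a quick check on what dnorm returns, the sketch below compares it with the standard normal density formula, exp(-x^2/2)/sqrt(2*pi), at a few points. The name by_hand is introduced here for illustration only.

```{r dnorm-check}
# Evaluate the standard normal density at a few points, two ways.
x_check <- c(-2, -1, 0, 1, 2)
by_hand <- exp(-x_check^2 / 2) / sqrt(2 * pi)
# all.equal() compares the two vectors within a small numerical tolerance.
all.equal(dnorm(x_check), by_hand)
```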
simon-5501-03-normal-calculations.qmd, 4
## P[Z < 1.5]
Use geom_vline and geom_label to draw a vertical reference line and add text to the normal curve. The pnorm function computes the standard normal probability.
```{r prob-1}
a <- 1.5
normal_curve +
  geom_vline(xintercept=a) +
  geom_label(x=a, y=0.4, label=a) +
  geom_label(x=a-0.5, y=0, label="Area = ?")
pnorm(1.5)
```
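One way to see that pnorm is the area under the curve to the left of a value is to integrate dnorm numerically and compare. This is only a sketch; the name area is hypothetical, and integrate is the base R numerical integrator.

```{r prob-1-check}
a <- 1.5
# Numerically integrate the standard normal density from -Inf to a.
area <- integrate(dnorm, lower = -Inf, upper = a)
area$value
# The two results should agree to several decimal places.
pnorm(a)
```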
## P[Z > 1]
When you are calculating the probability on the right (probability greater than some number), use 1-pnorm.
```{r prob-3}
a <- 1
normal_curve +
  geom_vline(xintercept=a) +
  geom_label(x=a, y=0.4, label=a) +
  geom_label(x=a+0.5, y=0, label="Area = ?")
1 - pnorm(1)
```
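An alternative to subtracting from 1 is the lower.tail argument of pnorm, which asks for the right-hand area directly. The sketch below compares the two approaches; the variable names are hypothetical.

```{r prob-3-alt}
a <- 1
right_by_subtraction <- 1 - pnorm(a)
# lower.tail = FALSE returns P[Z > a] instead of P[Z <= a].
right_by_argument <- pnorm(a, lower.tail = FALSE)
c(right_by_subtraction, right_by_argument)
```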
## P[-2.5 < Z < 2.5]
When you are calculating the probability between two values, compute pnorm of the larger value minus pnorm of the smaller value.
```{r prob-5a}
a <- 2.5
normal_curve +
  geom_vline(xintercept=-a) +
  geom_vline(xintercept= a) +
  geom_label(x=-a, y=0.4, label=-a) +
  geom_label(x= a, y=0.4, label= a) +
  geom_label(x=0, y=0, label="Area = ?")
pnorm(2.5) - pnorm(-2.5)
```
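Because the standard normal curve is symmetric around zero, an interval of the form -a to +a can also be computed as 2*pnorm(a) - 1. The sketch below checks that shortcut against the subtraction rule; the names between and shortcut are hypothetical.

```{r prob-5a-check}
a <- 2.5
# General rule: pnorm of the larger value minus pnorm of the smaller value.
between <- pnorm(a) - pnorm(-a)
# Shortcut that relies on symmetry, pnorm(-a) = 1 - pnorm(a).
shortcut <- 2 * pnorm(a) - 1
c(between, shortcut)
```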
## P[-0.5 < Z < 0.5]
```{r prob-6}
a <- 0.5
normal_curve +
  geom_vline(xintercept=-a) +
  geom_vline(xintercept= a) +
  geom_label(x=-a, y=0.4, label=-a) +
  geom_label(x= a, y=0.4, label= a) +
  geom_label(x=0, y=0, label="Area = ?")
pnorm(0.5) - pnorm(-0.5)
```
## 25th percentile of a standard normal
Use qnorm to calculate quantiles of the standard normal distribution.
```{r quantile-1}
p <- 0.25
a <- qnorm(p)
normal_curve +
  geom_vline(xintercept=a) +
  geom_label(x=a, y=0.4, label="Quantile = ?") +
  geom_label(x=a-0.5, y=0, label=p)
qnorm(0.25)
```
## 90th percentile of a standard normal
```{r quantile-2}
p <- 0.9
a <- qnorm(p)
normal_curve +
  geom_vline(xintercept=a) +
  geom_label(x=a, y=0.4, label="Quantile = ?") +
  geom_label(x=a-0.5, y=0, label=p)
qnorm(0.9)
```
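The qnorm function is the inverse of pnorm: a quantile fed back into pnorm should return the original probability. A minimal sketch of that round trip, using the two probabilities from the slides above (the names p_check and q_check are hypothetical):

```{r quantile-check}
p_check <- c(0.25, 0.9)
# Convert probabilities to standard normal quantiles.
q_check <- qnorm(p_check)
q_check
# Converting the quantiles back should recover 0.25 and 0.9.
pnorm(q_check)
```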
## Break #2
- What you have learned
  - Normal probabilities and quantiles
- What’s coming next
  - Assessing normality
## Assessing normality
- No variable follows a perfect normal distribution
  - But many are close
- How to assess (approximate) normality
  - Histogram
  - Boxplot
  - Normal probability plot
  - Avoid formal tests of normality
## Histogram
- Peak in the middle
- Roughly symmetric
- Falls off exponentially
- Warning!! Bar width can influence your interpretation
  - Try two or more bar widths
## Sample Boxplot
## What to look for in the boxplot
- Median halfway between the 25th and 75th percentiles
- Whiskers are the same length
- Whiskers not too short, not too long
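
The checks above can also be made numerically. The sketch below uses simulated values (an arbitrary mean and standard deviation, not real data) and the base R function boxplot.stats, which returns the five numbers that the boxplot draws. The names sim_box and s are hypothetical.

```{r boxplot-check}
set.seed(5501)  # arbitrary seed, for reproducibility only
sim_box <- rnorm(200, mean = 50, sd = 10)  # simulated stand-in data
# stats holds: lower whisker end, lower hinge, median, upper hinge, upper whisker end
s <- boxplot.stats(sim_box)$stats
# The two halves of the box should be roughly equal in length.
c(lower_box = s[3] - s[2], upper_box = s[4] - s[3])
# The two whiskers should be roughly equal in length.
c(lower_whisker = s[2] - s[1], upper_whisker = s[5] - s[4])
boxplot(sim_box, horizontal = TRUE)
```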
## Constructing a normal probability plot, 1
- Calculate rank
- Divide by (n+1)
- Compute corresponding normal percentiles
- Compare these on a graph to the original data
- Roughly straight line implies normality
## Constructing a normal probability plot, 2
Use the rank function to assign 1 to the smallest value, 2 to the next smallest value, etc. up to n for the largest value.
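
The construction steps can be sketched by hand in base R. The data here are simulated with arbitrary parameters, and the names sim_qq, r, p_rank, and z are hypothetical. Note that qqnorm uses slightly different plotting positions (from ppoints) than rank/(n+1), so its plot will be close to this one but not identical.

```{r qqplot-by-hand}
set.seed(5501)  # arbitrary seed, for reproducibility only
sim_qq <- rnorm(50, mean = 50, sd = 10)  # simulated stand-in data
n <- length(sim_qq)
r <- rank(sim_qq)       # 1 for the smallest value, n for the largest
p_rank <- r / (n + 1)   # dividing by n+1 keeps every value strictly between 0 and 1
z <- qnorm(p_rank)      # corresponding standard normal percentiles
# A roughly straight line suggests approximate normality.
plot(z, sim_qq, xlab = "Normal percentiles", ylab = "Simulated data")
```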
copyright: >
  The author of the jse article holds the copyright, but does not list
  conditions under which it can be used. Individual use for educational
  purposes is probably permitted under the Fair Use provisions of
  U.S. Copyright laws.
description: >
  Forced Expiratory Volume (FEV) in children. The data was collected
  in Boston in the 1970s.
vars:
  age:
    scale: ratio
    range: positive integer
    unit: years
  fev:
    label: Forced Expiratory Volume
    scale: ratio
    range: positive real
    unit: liters
## Data dictionary for fev, 4
  ht:
    label: Height
    scale: positive real
    unit: inches
  sex:
    value:
      0: Female
      1: Male
  smoke:
    value:
      0: Nonsmoker
      1: Smoker
---
simon-5501-03-fev.qmd, 1
---
title: "Analysis of fev data"
format:
  html:
    embed-resources: true
editor: source
---
This program assesses the normality of variables in a study of pulmonary function in children. There is a [data dictionary][dd] that provides more details about the data. The program was written by Steve Simon on 2024-09-02 and is placed in the public domain.
[dd]: https://github.com/pmean/datasets/blob/master/fev.yaml
## Libraries
The tidyverse library is the only one you need for this program.
```{r setup}
#| message: false
#| warning: false
library(tidyverse)
```
## List variable names
Since the variable names are not listed in the data file itself, you need to list them here.
```{r names}
fev_names <- c(
  "age",
  "fev",
  "ht",
  "sex",
  "smoke")
```
## Reading the data
Here is the code to read the data and show a glimpse.
```{r read}
fev <- read_csv(
  file="../data/fev.csv",
  col_names=fev_names,
  col_types="nnncc")
glimpse(fev)
```
## Calculate mean and standard deviation for fev
To orient yourself to the data, calculate a few descriptive statistics.
```{r descriptive-fev}
fev |>
  summarize(
    fev_mean=mean(fev),
    fev_stdv=sd(fev))
```
## Histogram for fev, narrow bars
```{r histogram-fev-narrow}
ggplot(data=fev, aes(x=fev)) +
  geom_histogram(
    binwidth=0.1,
    color="black",
    fill="white")
```
Although some may interpret these histograms as showing a slight skewness, I would interpret them as being approximately normal.
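
Since bar width can influence your interpretation, it may help to redraw a histogram with wider bars. The sketch below uses simulated values (an arbitrary mean and standard deviation) as a stand-in so that it runs without the data file; the names sim_hist and wide_bars are hypothetical. With the real data, the same geom_histogram call with binwidth=0.5 could be applied to the fev data frame.

```{r histogram-sim-wide}
library(ggplot2)
set.seed(5501)  # arbitrary seed, for reproducibility only
# Simulated stand-in, not the real fev data.
sim_hist <- data.frame(fev = rnorm(500, mean = 3, sd = 1))
# Same layout as the narrow-bar histogram, but with bars five times as wide.
wide_bars <- ggplot(data = sim_hist, aes(x = fev)) +
  geom_histogram(
    binwidth = 0.5,
    color = "black",
    fill = "white")
wide_bars
```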
## Normal probability plot for fev
The qqnorm function produces a normal probability plot. The default option for most plots is landscape orientation (the width is larger than the height). The q-q plot, however, looks best if figure width and height are equal.
```{r qqplot-fev}
#| fig-width: 5
#| fig-height: 5
qqnorm(fev$fev)
```
The normal probability plot is reasonably close to a straight line, indicating that the data comes reasonably close to following a normal distribution.
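
Judging straightness is easier with a reference line. The base R function qqline adds a line through the first and third quartiles of a normal probability plot. The sketch below uses simulated values with arbitrary parameters so that it runs on its own; the names sim_line and qq are hypothetical. With the real data, qqline(fev$fev) could follow the qqnorm call.

```{r qqline-sim}
#| fig-width: 5
#| fig-height: 5
set.seed(5501)  # arbitrary seed, for reproducibility only
sim_line <- rnorm(200, mean = 3, sd = 1)  # simulated stand-in data
# qqnorm draws the plot and invisibly returns the plotted coordinates.
qq <- qqnorm(sim_line)
# Add a reference line through the first and third quartiles.
qqline(sim_line)
```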